Two-step generation of variable-word-length language model integrating local and global constraints

Authors

  • Shoichi Matsunaga
  • Shigeki Sagayama
Abstract

This paper proposes two-step generation of a variable-length class-based language model that integrates local and global constraints. In the first step, an initial class set is recursively designed using local constraints. Word elements for each class are determined using Kullback divergence and total entropy. In the second step, the word classes and words are recursively and iteratively recreated, by grouping consecutive words to generate longer units and by splitting the initial classes into finer classes. These operations in the second step are carried out selectively, taking into account local and global constraints on the basis of a minimum entropy criterion. Experiments showed that the perplexity of the proposed initial class set is superior to that of the conventional part-of-speech class set, and the perplexity of the variable-word-length model consequently becomes lower. Furthermore, this two-step model generation approach greatly reduces the training time.
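To make the minimum-entropy criterion concrete, here is a minimal Python sketch, not the authors' implementation, of a Brown-style bottom-up class merge driven by the entropy of a class bigram model. The function names (`class_bigram_entropy`, `greedy_merge`) are hypothetical, and the paper's actual procedure is richer: it additionally uses Kullback divergence to assign word elements to classes and applies the same entropy test to decide between grouping consecutive words into longer units and splitting classes.

```python
import math
from collections import Counter

def class_bigram_entropy(corpus, word2class):
    """Per-word cross-entropy (bits) of a class bigram model:
    P(w_i | w_{i-1}) = P(c_i | c_{i-1}) * P(w_i | c_i),
    with all counts taken from the same corpus."""
    cls = [word2class[w] for w in corpus]
    uni = Counter(cls)               # class unigram counts
    bi = Counter(zip(cls, cls[1:]))  # class bigram counts
    wfreq = Counter(corpus)          # word unigram counts
    log_likelihood = 0.0
    for (prev_w, cur_w), (prev_c, cur_c) in zip(
            zip(corpus, corpus[1:]), zip(cls, cls[1:])):
        p_class = bi[(prev_c, cur_c)] / uni[prev_c]
        p_word = wfreq[cur_w] / uni[cur_c]
        log_likelihood += math.log2(p_class * p_word)
    return -log_likelihood / (len(corpus) - 1)

def greedy_merge(corpus, target_classes):
    """Start with one class per word; repeatedly apply the class merge
    that yields the lowest corpus entropy (minimum entropy criterion).
    Naive O(K^2) recomputation per merge -- fine for a toy corpus."""
    word2class = {w: i for i, w in enumerate(sorted(set(corpus)))}
    while len(set(word2class.values())) > target_classes:
        classes = sorted(set(word2class.values()))
        best = None
        for i, a in enumerate(classes):
            for b in classes[i + 1:]:
                trial = {w: (a if c == b else c)
                         for w, c in word2class.items()}
                h = class_bigram_entropy(corpus, trial)
                if best is None or h < best[0]:
                    best = (h, trial)
        word2class = best[1]
    return word2class

corpus = "the cat sat on the mat and the dog sat on the rug".split()
classes = greedy_merge(corpus, 4)
print(classes)
print(class_bigram_entropy(corpus, classes))
```

This sketch covers only the merge direction; in the paper's second step, an analogous entropy comparison would also govern whether frequent consecutive words are fused into longer variable-length units or an initial class is split into finer ones.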


Similar articles

Speech Recognition Using a Stochastic Language Model Integrating Local and Global Constraints

In this paper, we propose a new stochastic language model that integrates local and global constraints effectively and describe a speech recognition system based on it. The proposed language model uses the dependencies within adjacent words as local constraints in the same way as conventional word N-gram models. To capture the global constraints between non-contiguous words, we take into account the...



genCNN: A Convolutional Architecture for Word Sequence Prediction

We propose a convolutional neural network, named genCNN, for word sequence prediction. Different from previous work on neural network-based language modeling and generation (e.g., RNN or LSTM), we choose not to greedily summarize the history of words as a fixed-length vector. Instead, we use a convolutional neural network to predict the next word with the history of words of variable length. Als...


Multi-span statistical language modeling for large vocabulary speech recognition

The goal of multi-span language modeling is to integrate the various constraints, both local and global, that are present in the language. In this paper, local constraints are captured via the usual n-gram approach, while global constraints are taken into account through the use of latent semantic analysis. An integrative formulation is derived for the combination of these two paradigms, result...


Context scope selection in multi-span statistical language modeling

A multi-span framework was recently proposed to integrate the various constraints, both local and global, that are present in the language. In this approach, local constraints are captured via n-gram language modeling, while global constraints are taken into account through the use of latent semantic analysis. The complementarity between these two paradigms translates into improved modeling per...



Publication year: 1998